Decoupled Vector-Fetch Architecture with a Scalarizing Compiler
Similar resources
A Decoupled Fetch-Execute Engine with Static Branch Prediction Support
We describe a method for supporting static branch prediction on a decoupled fetch-execute pipeline. Using instruction buffers to decouple instruction fetch from the execute pipeline is an effective way to minimize instruction cache penalties by allowing instruction fetch and stall miss handling to proceed independent of the execution pipeline. Dynamic branch prediction is typically used with su...
Compiler Generated Multithreading to Alleviate Memory Latency
Since the era of vector and pipelined computing, the computational speed is limited by the memory access time. Faster caches and more cache levels are used to bridge the growing gap between the memory and processor speeds. With the advent of multithreaded processors, it becomes feasible to concurrently fetch data and compute in two cooperating threads. A technique is presented to generate these...
Performance of the decoupled ACRI-1 architecture: the perfect club
This paper examines the performance potential of decoupled computer architectures on real-world codes, and includes the first performance bounds calculations to be published for the highly-decoupled ACRI-1 computer architecture. It also constitutes the first published work to report on the effectiveness of a decoupling Fortran90 compiler. Decoupling is an architectural optimisation which offers very ...
Tolerating Branch Predictor Latency on SMT
Simultaneous Multithreading (SMT) tolerates latency by executing instructions from multiple threads. If a thread is stalled, resources can be used by other threads. However, fetch stall conditions caused by multi-cycle branch predictors prevent SMT from achieving its full potential performance, since the flow of fetched instructions is halted. This paper proposes and evaluates solutions to deal with...
Increasing the Instruction Fetch Rate via Block-Structured Instruction Set Architectures - Microarchitecture, 1996., IEEE/ACM International Symposium on
To exploit larger amounts of instruction level parallelism, processors are being built with wider issue widths and larger numbers of functional units. Instruction fetch rate must also be increased in order to effectively exploit the performance potential of such processors. Block-structured ISAs provide an effective means of increasing the instruction fetch rate. We define an optimization, calle...